Skip to content

[GR-67169] Address JFR follow-up review comments#13440

Open
graalvmbot wants to merge 4 commits intomasterfrom
GR-67169-pr2
Open

[GR-67169] Address JFR follow-up review comments#13440
graalvmbot wants to merge 4 commits intomasterfrom
GR-67169-pr2

Conversation

@graalvmbot
Copy link
Copy Markdown
Collaborator

No description provided.

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 29, 2026
@thomaswue
Copy link
Copy Markdown
Member

@roberttoyonaga Hi Robert! This should address some of the follow-up review comments I received for the emergency dump changes. It now also narrows the semantics to do the dump only if the OOM exception is not caught.

Copy link
Copy Markdown
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomaswue! I've left a few comments below.

It now also narrows the semantics to do the dump only if the OOM exception is not caught.

This requires the user to set ReportFatalErrorOnOutOfMemoryError or ExitOnOutOfMemoryError. Does it make sense to attempt an emergency dump if an uncaught OOME reaches JavaThreads.dispatchUncaughtException?

}

@Uninterruptible(reason = "Locking without transition requires that the whole critical section is uninterruptible.")
public int writePreviousEpoch(JfrChunkWriter writer) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to duplicate int write(JfrChunkWriter writer, boolean flushpoint) here?
If so, maybe we can change write to writeCurrentEpoch(JfrChunkWriter writer)

epochData.buffer = JfrBufferAccess.allocate(JfrBufferType.C_HEAP);
}
@Uninterruptible(reason = "Locking without transition and result is only valid until epoch changes.", callerMustBe = true)
private long getSymbolIdFromTemporaryBuffer(Pointer buffer, UnsignedWord length, boolean previousEpoch) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization to avoid needing to malloc for duplicates!

Comment on lines +191 to +195
/*
* Threads only need to be registered once per epoch. However, the cached epoch id alone is
* not sufficient because repository resets at stop/emergency cleanup can wipe the current
* epoch state without clearing the virtual thread's cached id.
*/
Copy link
Copy Markdown
Collaborator

@roberttoyonaga roberttoyonaga Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch. If JFR is stopped then started again, the epoch ID and jfrEpochId of surviving threads may still match - incorrectly bypassing re-registration.
I don't think "emergency cleanup" is a problem anymore since you are removing that code in this PR.

However, the purpose of isVirtualThreadAlreadyRegistered is to have a fast path that avoids taking the lock if the vthread is already registered. Hotspot does this too. Introduced in #9570. This correction negates that benefit since we need to take the lock.

Maybe a solution is to bump the epoch with JfrTraceIdEpoch.getInstance().changeEpoch() in the JfrBeginRecordingOperation VM operation.

}

if (SubstrateGCOptions.ExitOnOutOfMemoryError.getValue()) {
if (LibC.isSupported()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove dumpJfrOnOutOfMemoryError() from here now?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to match HotSpot behavior. Our tests and research indicate that HotSpot does not do that dump when -XX:+ExitOnOutOfMemoryError is enabled. There is however indeed a general design question how close we want to match HotSpot's behavior. Or if we want to provide additional functionality.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with matching hotspot as closely as possible. Hotspot also does a JFR emergency dump upon VM crashes in VMError::report_and_die. I wonder if it's possible to do an emergency dump from VMError.shouldNotReachHere to achieve something similar. Although, it may not be possible since emergency dumps require a safepoint and we don't know what state the VM is in.

On the other hand, if we are okay not exactly matching Hotspot, I do think it could be useful generating an emergency dump if ExitOnOutOfMemoryError or if an OOME error goes uncaught by the application code. The user has already included JFR in the image build and JFR would already need to be running anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we might want to offer more options than HotSpot, at least if it does not make the implementation overly complex. @christianhaeubl What is your opinion on the longer term plans?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can offer additional options. However by default, I think Native Image should behave in the same way as HotSpot as differences can cause problems for people that want to adopt Native Image.

I wonder if it's possible to do an emergency dump from VMError.shouldNotReachHere to achieve something similar

It should be possible, but it will definitely require some work as we would want to skip JFR emergency dumping in some cases (e.g., the crash corrupted some VM-internal data structure, a recursive crash happens during JFR emergency dumping, the process is out-of-memory on the OS-level). As HotSpot has pretty much the same problem, I assume that they already implemented some heuristic.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However by default, I think Native Image should behave in the same way as HotSpot as differences can cause problems for people that want to adopt Native Image

Okay, let's match Hotspot and not dump on ExitOnOutOfMemoryError or uncaught OOME.

As HotSpot has pretty much the same problem, I assume that they already implemented some heuristic.

Yes, that's right. Hotspot does a bunch of things to try to increase the likelihood of the dump succeeding on VM crash, but acknowledges it's still only a best effort. We can decide later whether to do this in a future enhancement.

Copy link
Copy Markdown
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomaswue! Please see my comment above about matching Hotspot.

I do think it would be useful to users if we extending the conditions in which we dump, but I understand the value in matching hotspot as closely as possible. Overall, I am okay with either decision.

}

@Uninterruptible(reason = "Locking without transition and result is only valid until epoch changes.", callerMustBe = true)
public long getSymbolId(String imageHeapString, boolean previousEpoch, boolean replaceDotWithSlash) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that getSymbolId(String imageHeapString) is only called from JfrMethodRepository now. JfrTypeRepository only calls getSymbolId(Pointer buffer). Is this because, with Crema, not all type Strings will be on the image heap, but method names will be?

Copy link
Copy Markdown
Collaborator

@roberttoyonaga roberttoyonaga May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it doesn't matter since the JfrSymbolRepository no longer pins any objects in RawStrucutres. The behaviour is the same regardless of whether the String is on the image heap (we always convert to a native utf-8 buffer instead of pinning a String object). So I think that the parameter name imageHeapString can be misleading. And we should no longer have the assert on line 95.

Additionally, if this is true, we can probably optimize things by calling JfrSymbolRepository.getSymbolId(String imageHeapString, boolean previousEpoch, boolean replaceDotWithSlash) directly from JfrTypeRepository instead of converting to a buffer there first. This would help us bypass malloc on the hot path and reduce code duplication. ie I think we can get rid of JfrTypeRepository.getSymbolId(String symbol, boolean previousEpoch, boolean replaceDotWithSlash)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants